ABSTRACT
Objective:
Cross tables appeared frequently in the early biostatistics literature, and they remain important in many clinical examinations today. The aim of this study is to investigate how 2x2 tables are treated in the probability literature and how they are applied in practice, so that different methods can be developed for applications.
Methods:
The method used to determine the distribution of a 2x2 type table is to consider one cell of a table as a random variable and calculate the probability that this variable can take the observed value. Hypergeometric distribution was taken into consideration in the study. This issue is explained in the methodology section of the study.
Results:
Important statistics obtained from 2x2 tables, such as the odds ratio, are numerical values that guide the researcher in experimental studies. Given the distribution of the table, the probabilities of these values are an important finding for the experimental study. In particular, a high probability value indicates how well a statistic commonly used in biostatistics applications, such as the odds ratio, represents the experimental study performed.
Conclusion:
The study yields two results: the determination of the maximum probability ratio representing the experimental study, and the weighted odds ratios used to combine odds ratios in meta-analysis.
Introduction
One of the problems encountered in scientific research is the inadequacy of data. This can be due to the rarity of data, limits on time and cost, or the lack of specialized personnel. For this reason, especially in health research, clinical trials and studies are undertaken on a limited number of units. Sometimes it is necessary to work with small samples for ethical reasons. In such cases, combining studies with similar characteristics by different researchers may make the study findings more meaningful. Developing suitable combination methods is therefore necessary.
The most striking example of this is the combination of odds ratios ($\psi$). The odds ratio combining methods in the literature are the Mantel-Haenszel, Peto, General Variance, and DerSimonian-Laird methods. Detailed information on these methods can be found in Katz et al. (1) and Morris and Gardner (2). These studies give important information about establishing confidence intervals for the odds ratio, using the normal distribution. However, the normality condition may not always hold. In this case, it is important to determine the distribution of the odds ratio. No study in the literature has reported the distribution of the odds ratio. However, a distribution that can be used in contingency tables has been examined by Patnaik (3) and Stevens (4). The studies of these researchers are presented with examples in the following sections. These examples were very useful in calculating the distribution of the odds ratio, which is shown in the example in the Results section of our study. In addition, the distribution of combined odds ratios is calculated in a real data application.
Some Probabilistic Notes on Contingency Tables
In biostatistics, the statistical methods which are frequently used in both retrospective and prospective studies are based on statistics such as relative risk and odds ratio obtained from the information in Table 1. Therefore, it is very important to examine the probabilistic features of this table.
As a retrospective study is limited to observed data, the experimental values are fixed. However, this does not mean that they cannot vary with the retrospective follow-up period or for other reasons. Thus, the value $a$ in the table is the observed value of $X_1$. The same applies to the control group. The probability $\Pr\{X_1 = a\}$ can be calculated as the ratio of the desired states to all possible states, as in the hypergeometric probability. The number of possible states is written as follows,
$$\frac{N!}{a!\,b!\,c!\,d!}$$
The number of desired states can be calculated as follows,
We have
where $\max\{0,\ m + r - N\} \le X_1 \le \min\{m,\ r\}$.
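The hypergeometric probabilities over this range can be sketched numerically. The block below is a minimal illustration, assuming that $m$ is the first row total, $r$ the first column total and $N$ the grand total, as the bounds above suggest. The function name `x1_distribution` and the margins in the usage lines are hypothetical and are not taken from the study's tables.

```python
from math import comb

def x1_distribution(m, r, N):
    """Hypergeometric probabilities Pr{X1 = a} for a 2x2 table with fixed margins.

    m: first row total, r: first column total, N: grand total
    (assumed meaning of the symbols used in the text).
    """
    n = N - m                                # second row total
    lo, hi = max(0, m + r - N), min(m, r)    # admissible range of X1
    total = comb(N, r)                       # ways to pick the r units of the first column
    return {a: comb(m, a) * comb(n, r - a) / total for a in range(lo, hi + 1)}

# Hypothetical margins chosen for illustration only.
dist = x1_distribution(m=10, r=10, N=20)
mode = max(dist, key=dist.get)               # value of X1 with the highest probability
print(mode, round(dist[mode], 4))
```

Listing the whole dictionary gives the kind of state-by-state probability table used in the samples, and the probabilities sum to one by Vandermonde's identity.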
Sample 1. Consider the data in Table 2
In this example, the variable $X_1$ takes values in the range $0 \le X_1 \le 10$. Let us show the possible states and probabilities of the variable $X_1$ in Table 3.
According to Table 3, the value $X_1 = 5$ has the highest probability. The graph of the probability values in Table 3 is as follows,
Methodology
In the literature, the first study of this probability is that of P. B. Patnaik in 1948. In that study, the cell common to the case group and the positive effect was treated as a random variable, and Patnaik showed that it has a hypergeometric distribution. This makes it easier to obtain the term representing the odds ratio from the conditional probability of the hypergeometric distribution. Therefore, the hypergeometric distribution was used in this study. Patnaik calculated the mean and variance of the distribution as $E X_1 = mr/N$ and $\operatorname{Var} X_1 = mnrs/\left(N^2 (N - 1)\right)$ (3). The mean $E X_1$ calculated by Patnaik is used as the expected value of the cells in the chi-square test of association. This was followed by W. L. Stevens, who expressed the conditional probability of the variable $X_1$ as a function of $a$ under the condition that all marginal totals are known, as follows (4),
where $\psi = ad/bc$ is the odds ratio. The conditional probability mentioned above can be obtained from the product of two binomial probabilities, with $c = r - a$,
$$\Pr\{X_1 = a \mid X_1 + X_2 = r,\ m\} \propto \Pr\{X_1 = a\}\,\Pr\{X_2 = c\} = \binom{m}{a} p_1^{a} (1 - p_1)^{m - a} \binom{n}{c} p_2^{c} (1 - p_2)^{n - c},$$
where $p_1$ and $p_2$ are the probabilities of success in the case and control groups, respectively. In addition, the ratio $p_1/p_2$ is called the relative risk. As a result, this equation ensures that the conditional probability can be written as a function of $a$. This is an important result for 2x2 tables. If the observed value of $X_1$ is smaller than some values in the possible order, $\psi$ will remain smaller than the odds ratios of these values, and vice versa. Using this feature, Jerome Cornfield formed a confidence interval for the odds ratio with probability $1 - \alpha$ in his 1956 study (5). Cornfield obtained the lower limit $\psi_1$ for $\psi$ from the solution of the following equation,
Similarly, he obtained the upper limit $\psi_2$ for $\psi$ from the solution of the following equation,
Thus, the confidence interval can be written as $\Pr\{\psi_1 \le \psi \le \psi_2\} = 1 - \alpha$.
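As an illustration of how such limits can be computed, the sketch below uses the standard conditional (noncentral hypergeometric) form $\Pr\{X_1 = u \mid \text{margins}\} \propto \binom{m}{u}\binom{n}{r-u}\psi^{u}$ and solves the two tail equations with probability $\alpha/2$ each. These are the usual textbook forms and are an assumption here; they may differ in notation or detail from the equations of Stevens and Cornfield referred to above. All function names are hypothetical, and the table in the usage lines is invented.

```python
from math import comb, exp, log, inf

def conditional_pmf(m, n, r, psi):
    """Pr{X1 = u | margins m, n, r fixed} when the odds ratio equals psi."""
    lo, hi = max(0, r - n), min(m, r)
    # Work on the log scale so that psi**u cannot overflow for large psi.
    logw = {u: log(comb(m, u)) + log(comb(n, r - u)) + u * log(psi)
            for u in range(lo, hi + 1)}
    top = max(logw.values())
    w = {u: exp(v - top) for u, v in logw.items()}
    total = sum(w.values())
    return {u: x / total for u, x in w.items()}

def tail(m, n, r, a, psi, upper):
    """Upper tail Pr{X1 >= a} or lower tail Pr{X1 <= a} under odds ratio psi."""
    pmf = conditional_pmf(m, n, r, psi)
    return sum(p for u, p in pmf.items() if (u >= a if upper else u <= a))

def solve_increasing(f, target, lo=1e-8, hi=1e8, steps=200):
    """Bisection on a log scale for f(psi) = target, with f increasing in psi."""
    for _ in range(steps):
        mid = (lo * hi) ** 0.5               # geometric midpoint
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5

def exact_limits(a, b, c, d, alpha=0.05):
    """Assumed exact limits: each tail probability is set to alpha / 2."""
    m, n, r = a + b, c + d, a + c
    lo, hi = max(0, r - n), min(m, r)
    # The upper tail increases with psi; the lower tail decreases with psi.
    psi1 = 0.0 if a == lo else solve_increasing(
        lambda p: tail(m, n, r, a, p, True), alpha / 2)
    psi2 = inf if a == hi else solve_increasing(
        lambda p: -tail(m, n, r, a, p, False), -alpha / 2)
    return psi1, psi2

# Hypothetical table a=7, b=3, c=2, d=8 (not data from the study).
psi1, psi2 = exact_limits(7, 3, 2, 8)
print(psi1, psi2)
```

Bisection is used because each tail probability is monotone in $\psi$, so no derivative or external solver is needed. At $\psi = 1$ the conditional probabilities reduce to the ordinary hypergeometric probabilities.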
Results and Discussion
Here, the conditional probability is obtained as the product of two binomial distributions of the independent variables $X_1$ and $X_2$. Normal-distribution test procedures can then be used in hypothesis tests, since the limiting distributions of $X_1$ and $X_2$ approach the normal distribution. However, this holds only if the marginal totals are large enough; otherwise it may lead to incorrect interpretations. When an exact test statistic for $\psi$ is desired, it is more accurate to obtain the exact distribution and to test with a nonparametric method. For the mean and variance of $\psi$ to be finite, it is sufficient that the cells satisfy the conditions $a < m$ and $c > 0$. In this case, it is necessary to obtain the conditional distribution of $\psi$ under these conditions. For this reason, many researchers use the normal distribution approach. The conditional distribution can be obtained by dividing the binomial probabilities by $\Pr\{X_1 < m\}$ for $X_1$ and by $\Pr\{X_2 > 0\}$ for $X_2$. In the following example, we show the possible values and probabilities of $\psi$.
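The construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: $X_1$ is binomial with size $m$ and success probability $p_1$, $X_2$ is binomial with size $n$ and success probability $p_2$, and $\psi = ad/bc$ with $a = X_1$, $b = m - X_1$, $c = X_2$, $d = n - X_2$. The function names and the parameter values in the usage lines are hypothetical; how $p_1$ and $p_2$ are chosen in the study is not stated here.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb

def truncated_binomial(size, p, keep):
    """Binomial(size, p) probabilities restricted to values x with keep(x) true."""
    raw = {x: comb(size, x) * p ** x * (1 - p) ** (size - x)
           for x in range(size + 1) if keep(x)}
    total = sum(raw.values())                # Pr{X1 < m} or Pr{X2 > 0}
    return {x: w / total for x, w in raw.items()}

def odds_ratio_distribution(m, n, p1, p2):
    """Distribution of psi = ad / bc built from the product probabilities."""
    px1 = truncated_binomial(m, p1, lambda x: x < m)   # condition a < m
    px2 = truncated_binomial(n, p2, lambda x: x > 0)   # condition c > 0
    dist = defaultdict(float)
    for x1, q1 in px1.items():
        for x2, q2 in px2.items():
            # Exact fractions so that equal odds ratios fall into the same cell.
            psi = Fraction(x1 * (n - x2), (m - x1) * x2)
            dist[psi] += q1 * q2
    return dict(sorted(dist.items()))

# Hypothetical group sizes and success probabilities, for illustration only.
dist = odds_ratio_distribution(m=5, n=5, p1=0.6, p2=0.4)
expected = sum(float(v) * p for v, p in dist.items())  # mean of psi
most_likely = max(dist, key=dist.get)                  # value with highest probability
print(most_likely, expected)
```

The two conditions keep the denominator $bc$ positive, which is why every value of $\psi$ in the resulting table is finite.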
Sample 2. Let the sample data be as follows,
The multiplication probability table and the probability table of $\psi$ can be formed from the data in Table 4 using the conditional probabilities of the variables $X_1$ and $X_2$,
The graph of multiplication probabilities in Table 5 is as follows,
When the probabilities in Table 6 are taken into consideration, the variable is seen to have its highest probability at $\psi = 4$. The odds ratio for the experimental data in Table 4 takes a value between 2 and 5 with high probability ($\Pr\{2 \le \psi \le 5\} = 0.443$). Such probabilistic information can also be supported statistically by creating rejection and acceptance regions from the distribution at significance level $\alpha$. Moreover, the mean $E\psi = 6.7133$ obtained from the distribution is an important statistic for $\psi$. The graph of the probabilities in Table 6 is as follows.
Table 6 shows the distribution of $\psi$. The distribution of $\psi$ can easily be obtained when there are multiple tables for the same $X_1$. Let us assume that there are $k$ tables for $X_1$. In this case, the probabilities for each table are written as follows,
$$\Pr\{\psi_j = u\} = p_j(u), \quad j = 1, \dots, k.$$
The distribution of all tables will be as follows,
Since the mean $E$ is derived from the $k$ samples selected from the population, it can represent the population well. Finally, the following example of a combined odds ratio is presented.
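A sketch of how the $k$ distributions $p_j(u)$ might be merged into one distribution is given below. The combining rule is an assumption made for illustration: a mixture with equal weights $1/k$ by default, with optional user-supplied weights. The rule actually used in the study may differ, and the names `combine_distributions` and `mean` are hypothetical.

```python
from collections import defaultdict

def combine_distributions(tables, weights=None):
    """Mix k distributions p_j(u) into one (ASSUMED rule, not taken from the study).

    tables:  list of dicts mapping an odds ratio value u to p_j(u)
    weights: optional non-negative weights, one per table; equal by default
    """
    k = len(tables)
    if weights is None:
        weights = [1.0 / k] * k
    total = sum(weights)                     # normalise so the weights sum to one
    combined = defaultdict(float)
    for p_j, w in zip(tables, weights):
        for u, prob in p_j.items():
            combined[u] += (w / total) * prob
    return dict(sorted(combined.items()))

def mean(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(u * p for u, p in dist.items())

# Two hypothetical single-table distributions of the odds ratio.
p1 = {1: 0.5, 2: 0.5}
p2 = {2: 0.25, 4: 0.75}
combined = combine_distributions([p1, p2])
print(combined, mean(combined))
```

Under this mixture rule the mean of the combined distribution is the weighted average of the single-table means, which gives a simple check on the calculation.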
The following example table was taken from Afshari et al. (6). This meta-analysis study by Mahdi investigates the effect of opium and smoking on bladder cancer. Table 7 was created by considering only opium use. The distribution and expected value of the odds ratio were obtained for each study. The expected value of the combined odds ratio is given at the end of the table. The MATLAB program used in the calculation is attached.
Conclusion
In general, the studies in the field of biostatistics form a comprehensive and technically rich literature. This is because biostatistics combines many scientific techniques with medical data. A scientific technique needs not only an opinion but also an interpretation, and that interpretation is usually attributed to the data. The interpretation is, however, the common point of data and technique; it increases both the scientific value of the results obtained from the data and the importance of the technique used. Biostatistics studies are therefore important because they bring data and technique together. If the odds ratio obtained from data arranged as in Table 1 is smaller than 1, the factor decreases the risk of disease. If the odds ratio is equal to 1, the factor has no effect on the disease. If the odds ratio is greater than 1, the factor increases the risk of the disease. Thus, $X_1$ is the most important variable in ensuring a high odds ratio. Because the value of $X_1$ is subject to chance, it is important to know the value at which its probability is highest. For this, the distribution of $X_1$ must be obtained and interpreted. In the study, a 2 × 2 data table has been shown to have a hypergeometric distribution when considered unconditionally. Under this distribution, the variable $X_1$ takes its maximum probability at the integer part of $(m + 1)(r + 1)/(N + 2)$. In addition, this value corresponds to the maximum probability value of the odds ratio. This result is very important in terms of both data and theory. If data are not interpreted within a theoretical structure, no conclusion can be obtained. Similarly, obtaining the distribution of $\psi$ is also important for interpretation. When values obtained from different tables are combined in a probability distribution, the distribution of a single variable $\psi$ can be obtained for all tables.
This result is also very important for meta-analysis. The mean $E\psi$ of a single table is therefore important for combined tables. Many methods for combining odds ratios have been presented in the literature; however, no method of this kind has been presented. The reason is that the existing methods are easy for researchers to calculate. However, probabilistic methods are preferable when more accurate results are needed. Finally, if the numbers of cases and controls in a 2x2 table are sufficient, parametric methods such as the Mantel-Haenszel, Peto, General Variance, and DerSimonian-Laird methods can be used easily. If the amount of data is quite low, it is more appropriate to use probabilistic methods.