Computer Science Homework Help

IT 632 Auburn University Main Campus Cluster Analysis Discussion Questions

In chapter 8 we focus on cluster analysis. Therefore, after reading the chapter, answer the following questions:

  1. What are the characteristics of data?
  2. Compare the difference in each of the following clustering types: prototype-based, density-based, graph-based.
  3. What is a scalable clustering algorithm?
  4. How do you choose the right algorithm?

Reply to at least two classmates’ responses by the end of the week. 

Post from Jazmine:

What are the characteristics of data?

There are several characteristics of data that can influence cluster analysis. For example, high dimensionality reduces the density of clusters (Tan et al., 2019). Large data sets also may not work well with clustering algorithms that are not scalable. Sparseness, noise in the data, and outliers also affect cluster analysis (Tan et al., 2019). Finally, Tan et al. (2019) list the scale of each variable (particularly when the scales vary), the properties of the data space, and the types of variables (e.g., nominal, ordinal, discrete, continuous) as factors that influence cluster analysis.

Compare the difference in each of the following clustering types: prototype-based, density-based, graph-based.

Prototype-based clustering builds clusters around a prototype (such as a centroid) that defines each cluster, assigning data points by their proximity to that prototype; the resulting clusters tend to be globular (Tan et al., 2019). Graph-based clusters are comprised of interconnected objects: points are nodes and edges connect nearby points, so these clusters can take on irregular shapes (Tan et al., 2019). Density-based clustering distinguishes clusters (densely populated areas) from the sparsely populated regions that separate them (Sehgal & Garg, 2014). Like graph-based clustering, density-based clustering works well with irregular and intertwined clusters; unlike graph-based clustering, it also does well with noise and outliers.
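To make the prototype-based idea concrete, here is a minimal k-means sketch in plain Python. The 1-D data, the starting centroids, and k=2 are invented for the example; this is an illustrative toy, not the textbook's algorithm.

```python
def kmeans(points, centroids, iters=10):
    """Minimal 1-D k-means: assign each point to its nearest prototype
    (centroid), then move each centroid to the mean of its points."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # Prototype-based: membership is decided by distance to a centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Each prototype moves to the mean of its assigned points.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious 1-D groups; start the prototypes far apart.
cents, groups = kmeans([1.0, 1.2, 0.8, 9.0, 9.5, 8.7], [0.0, 10.0])
```

Note how the globular tendency Jazmine describes follows directly from the assignment rule: every point within reach of a centroid is pulled into that cluster regardless of shape.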

What is a scalable clustering algorithm?

A scalable clustering algorithm is one that continues to perform well as the data set grows, because it uses a reasonable amount of memory and completes in an acceptable amount of time. Many clustering algorithms only work well on small and medium-sized data sets (Tan et al., 2019). Tan et al. (2019) mention two scalable clustering algorithms, CURE and BIRCH. Such algorithms may reduce computational and memory requirements by sampling the data or by first partitioning it into disjoint sets. Other techniques include parallel and distributed computation, and summarization, in which the algorithm makes one pass over the data and then clusters based on the summaries (Tan et al., 2019).
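The summarization idea can be sketched as follows: keep only a running (count, total) summary per cluster in a single pass, instead of retaining every point. This is loosely inspired by BIRCH-style clustering features but is a simplified 1-D illustration, not the actual BIRCH algorithm; the threshold and data are made up.

```python
def one_pass_summarize(stream, threshold):
    """One pass over the data: each summary is [count, total], with
    centroid = total / count. A point joins the nearest summary whose
    centroid is within `threshold`; otherwise it starts a new summary.
    Memory grows with the number of summaries, not the number of points."""
    summaries = []
    for x in stream:
        # Find the summary with the nearest centroid (if any exist).
        best = min(summaries, key=lambda s: abs(x - s[1] / s[0]), default=None)
        if best is not None and abs(x - best[1] / best[0]) <= threshold:
            best[0] += 1   # absorb the point into the summary
            best[1] += x
        else:
            summaries.append([1, x])  # start a new summary
    return summaries

summaries = one_pass_summarize([1.0, 1.1, 0.9, 5.0, 5.2], 1.0)
```

After the pass, a full clustering algorithm would operate on the handful of summaries rather than the raw data, which is what makes the approach scale.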

How do you choose the right algorithm?

To choose the right algorithm, you have to determine the best clustering technique based on the shapes of the clusters, the distribution of the data, the densities of the clusters, and whether the clusters are well separated (Tan et al., 2019). It is also important to determine whether there is a relationship between the clusters and whether the clusters only exist in subspaces (Tan et al., 2019). Different clustering algorithms suit different data properties; for example, if a data set contains a lot of noise or many outliers, an algorithm should be chosen that is robust to those characteristics. The number of data points, the number of attributes, and the characteristics of the data should also be considered (Tan et al., 2019).

Post from Santhosh:

  1. What are the characteristics of data?

Data is any form of information that has been acquired and organized sensibly. Data, thus, are known facts, each of which carries an implied meaning.

Data definition and characteristics are an important database topic, and you should have at least a basic understanding of them.

There are five qualitative data characteristics.

When you have data of varying quality, do not assume that all of it is of value. Data must be of high quality to yield an optimal return, and for that it must have specific traits.

Data should be exact, providing facts that are accurate and reliable; precision saves both time and money. Do your due diligence to make sure the data you are using is credible, dependable, and consistent. Falsified data is worse than having no data at all.

  1. Compare the difference in each of the following clustering types: prototype-based, density-based, graph-based.

Clustering groups data objects based on the information found in the data that describes the objects and their relationships. The goal of clustering is to create groups such that the objects within a group are similar to one another and different from the objects in other groups. The greater the similarity within a group and the greater the difference between groups, the better the clustering quality.
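The "similar within, different between" goal can be quantified by comparing cohesion (average within-group distance, lower is better) with separation (average between-group distance, higher is better). The metric and 1-D example below are illustrative choices, not measures taken from the chapter.

```python
from itertools import combinations

def avg_pairwise(points):
    """Average absolute distance over all pairs (0.0 if fewer than 2 points)."""
    pairs = list(combinations(points, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs) if pairs else 0.0

def cohesion_and_separation(groups):
    """Cohesion: mean within-group pairwise distance.
    Separation: mean distance across pairs drawn from different groups."""
    cohesion = sum(avg_pairwise(g) for g in groups) / len(groups)
    cross = [abs(a - b)
             for g1, g2 in combinations(groups, 2)
             for a in g1 for b in g2]
    separation = sum(cross) / len(cross)
    return cohesion, separation

coh, sep = cohesion_and_separation([[1.0, 2.0], [10.0, 11.0]])
```

Here the two tight, well-separated groups score a low cohesion value and a high separation value, which is exactly the "better clustering quality" condition described above.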

For data to be of high quality and valuable, it must also be relevant. However, in today's dynamic, data-filled world, even necessary information is not always kept up to date. Data that is tailored to the individual user's requirements, and that is readily processable, is most convenient for application.

There are significant differences between prototype-based, density-based, and graph-based clustering; each organizes the data differently according to the descriptions of the data objects and their relationships. The purpose of grouping data objects is to create groups that are internally similar yet distinct from one another; the greater the similarity within a group, the more significant the differences between groupings (Mingxiao et al., 2017).

  1. What is a scalable clustering algorithm?

To cluster data at scale, we do the following: collect a data sample, group the data, and then fine-tune the clusters. In the first step, a uniform selection from the original data yields a highly representative subset. For simplicity, we limit ourselves to several popular parametric techniques, and then apply clustering and refinement algorithms. Generalizing the stable marriage problem, the clustering step can be described as a stable marriage solution, while constrained refinement is an iterative relocation strategy with rules. Comparable balanced clustering techniques have a complexity of O(kN log N), which is similar to the complexity of the overall approach. Compared with an unconstrained clustering method, such a framework performs about the same or better, and it has been tested on many data sets, including those with high-dimensional features (i.e., more than 20,000).
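The sample-then-assign pattern described above can be sketched roughly as follows. To keep the example deterministic, it uses a systematic sample (every `step`-th point) instead of uniform random sampling, and a crude placeholder (k spread-out sample values) stands in for actually clustering the sample; the refinement step is omitted entirely.

```python
def sample_then_assign(points, k, step):
    """Scalable pattern sketch: 'cluster' only a small sample, then make
    one cheap pass over the full data assigning each point to a center."""
    sample = sorted(points[::step])  # systematic sample, kept sorted
    # Crude placeholder for clustering the sample: pick k spread-out values.
    centers = [sample[round(i * (len(sample) - 1) / (k - 1))]
               for i in range(k)]
    # One pass over ALL points: nearest-center assignment.
    labels = [min(range(k), key=lambda i: abs(p - centers[i]))
              for p in points]
    return centers, labels

centers, labels = sample_then_assign(
    [1.0, 1.2, 0.9, 9.0, 9.3, 8.8, 1.1, 9.1], k=2, step=2)
```

The expensive work happens only on the sample; the full data set is touched once, which is the source of the scalability.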

  1. How do you choose the right algorithm?

To understand a situation well, a lot of data is usually required; nonetheless, data availability is frequently a challenge. When the training data is small, or the data set has fewer observations than features (as with genetics or text data), use high-bias/low-variance algorithms such as linear regression, Naive Bayes, or a linear SVM.

Low-bias/high-variance methods such as KNN, decision trees, or a kernel SVM can be used when the training data set contains many observations and the observation count exceeds the feature count.
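As an illustration of one of the low-bias/high-variance methods mentioned above, here is a minimal 1-nearest-neighbour classifier; the toy training points and labels are invented for the example.

```python
def nn_predict(train, query):
    """1-nearest-neighbour: predict the label of the closest training point.
    Low bias (no assumed functional form) but high variance (one noisy
    training point can flip predictions in its neighbourhood)."""
    features, label = min(
        train,
        key=lambda fl: sum((a - b) ** 2 for a, b in zip(fl[0], query)))
    return label

train = [((0.0, 0.0), "red"), ((0.1, 0.2), "red"), ((5.0, 5.0), "blue")]
```

Because the prediction is driven entirely by individual training points, the method needs many observations to be reliable, which matches the advice above.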

An accurate model will, with few exceptions, predict a response that is close to the actual answer. Interpretable techniques, such as linear regression, are easy to explain because the effect of each predictor can be understood directly. More flexible models, such as boosted trees or neural networks, can offer greater accuracy but at the expense of interpretability.
