Home > java > How to reduce execution time for this clustering computation?

# How to reduce execution time for this clustering computation?

January 13Hits:0

Is there any way to reduce total execution time for the function `getSilhouetteIndex`? P.S. I am using `weka SimpleKMeans` for getting kmeans and `ceval`.

````private double getSilhouetteIndex(List<ITSPoI> _POIs, SimpleKMeans kmeans, ClusterEvaluation ceval) {     double si_index = 0;     double[] ca = ceval.getClusterAssignments();     double[] d_arr = new double[ca.length];     List<Double> si_indexes = new ArrayList<Double>();      for (int i=0; i<ca.length; i++)     {         // STEP 1. Compute the average distance between the i-th PoI and all other points of a given cluster         double a = averageDist(_POIs,ca,i,1);          // STEP 2. Compute the average distance between the i-th PoI and all PoIs of other clusters         for (int j=0; j<ca.length; j++)         {             double d = averageDist(_POIs,ca,j,2);             d_arr[j] = d;         }          // STEP 3. Compute the the distance from the i-th PoI to its nearest cluster to which it does not belong         double b = d_arr[0];         for (Double _d : d_arr)         {             if (_d < b)                 b = _d;         }          // STEP 4. Compute the Silhouette index for the i-th PoI         double si = (b - a)/Math.max(a,b);         si_indexes.add(si);     }      // STEP 5. Compute the average index over all observations     double sum = 0;     for(Double _si : si_indexes)     {          sum += _si;     }     si_index = sum/si_indexes.size();      return si_index; }    private double averageDist(List<ITSPoI> _POIs, double[] ca, int id, int calc) {     double avgDist = 0;     List<ITSPoI> clusterPOIs = new ArrayList<ITSPoI>();      // Distances inside the cluster     if (calc == 1)     {         for (int i = 0; i<ca.length; i++)         {             if (ca[i] == ca[id])                 clusterPOIs.add(_POIs.get(i));         }     }     // Distances outside the cluster     else     {         for (int i = 0; i<ca.length; i++)         {             if (ca[i] != ca[id])                 clusterPOIs.add(_POIs.get(i));         }     }      double latx, lonx, laty, lony;     double[] dist = new double[clusterPOIs.size()];     latx = _POIs.get(id).getLat();     lonx = _POIs.get(id).getLon();     for (int i=0; i<clusterPOIs.size(); i++)     {         laty = clusterPOIs.get(i).getLat();         lony = clusterPOIs.get(i).getLon();         dist[i] = distanceGeo(latx,lonx,laty,lony);     }      double sum = 0;     for(Double d : dist)     {          sum += d;     }     avgDist = sum/dist.length;      return avgDist; }  private double distanceGeo(double lat1, double lon1, double lat2, double lon2) {     if (lat1 == lat2 && lon1 == lon2)     {         return 0;     }     else     {         double theta = lon1 - lon2;         double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));         dist = Math.acos(dist);         dist = rad2deg(dist);         dist = dist * 60 * 1.1515;         dist = dist * 1.609344;         return dist;     } } `
```

There are three methods for computing a great circle distance between two points that are specified by latitude and longitude:

1. You used the spherical law of cosines, which is the most straightforward method based on mathematical principles.
2. The haversine formula yields better accuracy for small distances, though at a greater computational cost.
3. An approximation can be obtained by using the Pythagorean theorem on an equirectangular projection.

For the purposes of clustering, an approximation of the great circle distance might be good enough. I would also declare this function `private static final` as a very strong hint that it can be inlined — as it should, since it's a pure mathematical function.

``````private static final double approxDistanceGeo(double lat1, double lon1, double lat2, double lon2)
{
if (lat1 == lat2 && lon1 == lon2)
{
return 0;
}
else
{
double x = deg2rad(lon1 - lon2) * Math.cos(deg2rad((lat1 + lat2) / 2));
double y = deg2rad(lat1 - lat2);
double dist = Math.sqrt(x * x + y * y);
dist = dist * 60 * 1.1515 * 1.609344;
return dist;
}
}
```
```

Working directly in radians could save a few `deg2rad()` conversions.

## Related Articles

• ### How to reduce execution time for this clustering computation?January 13

Is there any way to reduce total execution time for the function getSilhouetteIndex? P.S. I am using weka SimpleKMeans for getting kmeans and ceval. private double getSilhouetteIndex(List<ITSPoI> _POIs, SimpleKMeans kmeans, ClusterEvaluation ceval)

• ### Is it possible to install RHEV on el6 clustered computerJune 12

Is it possible to use Linux (el6) clustered computer can be used as a hypervisor for RHEV 3.3/Ovirt 3.3? Cause, we can not combine RAM/CPU from two or more hypervisores to be considered as a one source of resource while creating a huge guest on RHEV.

• ### Animating Map-Reduce executionFebruary 6

I'm trying to develop an animation program for Hadoop's (Map-Reduce) WordCount program using Java. When the user runs the application, a Java (Swing) Form opens up with a textarea where the user can enter the text and clicks the execute the button. T

• ### How to reduce execution time and compile time of this program?September 20

I have one programming question: The sine and cosine of x can be computed as follows. sin(x) = x - x3/3! + x5/5! - x7/7! - x9/9! + - cos(x) = 1 - x2/2! + x4/4! - x6/6! + x8/8! - - Your task is to compute the sine and cosine for given values of x (whe

• ### Reducing execution time for this gnarly SQLAugust 16

I'm creating an order system of some kind. There is a function that I use to get orders (in PHP) and retrieve the required information about the order. However, it seems that I am pulling a lot of information, including several left joins that could

• ### Reducing execution time of an HTML parsing scriptJuly 15

The script is intended to return an array with texts containing specific words in English and the equivalent texts in Polish from EUR-Lex - a website with EU documents. The script downloads the page which address is specified by the passed variable.

• ### Reducing system disk space usage on computer with SQL ServerNovember 20

The disk usage on my computer continues to rising with time. How can reduce his size without damaging programs and windows. The disk is SSD with 180 GB, because of that this is sensitive to me ! Biggest folders: Program Files (x86) -52.6 GB Windows -

• ### What's the most used programming language in high performance computing? And why? December 4

I believe a lot of Fortran is used in HPC, but not sure if that's only for legacy reasons. Features of modern programming languages like garbage collection or run-time polymorphism are not suitable for HPC since speed matters so not sure where C# or

• ### How can I reduce bandwidth of a computer in LAN? January 13

Possible Duplicate: Home Network Bandwidth Control The title might be confusing as I don't know the correct terminology. The situation is: I have two PCs connected directly to each other. One of the PC connects directly to internet and the other shar

• ### Taking photos of computer screens - reducing glare with S95February 6

I am using my S95 to take photographs of LCD computer screens which are used to display Microfilm images related to my research. Unfortunately they aren't running a normal OS, so it's not an option to take screen-shots. So far, it's working decently

• ### Pathfinding and clustersFebruary 22

For an assignment, I am required to implement an A* algorithm in order to find the shortest path between two object using different heuristics null, this effectively becomes Dijkstra Euclidean distance Clustering So I have successfully implemented th

• ### how to use several computers as a cluster?August 17

I have 7 computers running Gentoo Linux with Quad-Core processors and I want to be able do distribute the execution of a program to all of those machines. I have some multi-threaded programs and I wanted to use all the 28 available CPUs on my cluster

• ### Has Microsoft improved Scandisk, CHKDSK and Defrag in Windows 7?November 14

Is there only interface design change or something internally has changed? Most of the time I could see that Scandisk is unable to check OS partition as there are no rights. It schedules the task to be run at the time of Windows start. The ScanDisk t

• ### Good use case for Akka December 20

I have heard lots of raving about Akka framework (Java/Scala service platform), but so far have not seen many actual examples of use cases it would be good for. So I would be interested in hearing about things developers have used it succesfully. Onl

• ### When to create Indexes?January 22

I have a stored procedure that takes around 2 minutes to execute. The execution plan suggested me to create a non clustered index on a table (which is a high traffic table with millions of records in it, plus it gets a constant stream of data every s

• ### What can be done to further enhance performance of Multiple Join and Aggregate Queries?November 5

I have a typical star schema simulated here, and I am mentioning two queries: first query simply joins the fact table with 2 dimension tables and 1 calendar table, and the second query joins and aggregates. I have experimented and have created indexe

• ### Paging performance with customizable sorting over many millions of rowsDecember 31

In our application we have a grid where users can page over a large number of records (10-20 million). The grid supports sorting in ascending and descending order in a number of columns (20+). Many of the values are also not unique and so the applica

• ### Implement A star (A*) path algorithm in large map, low performanceJanuary 26

I'm using this A star (A*) Pathfinder.java to calculate & generate my route in an Android map app. https://github.com/xSmallDeadGuyx/SimpleAStar/blob/master/Pathfinder.java The size of the map is large, dimensions around 8000x8000, when I use the A s

• ### Would SSD drives benefit from a non-default allocation unit size?May 12

The default allocation unit size recommended when formatting a drive in our current set-up is 4096 bytes. I understand the basics of the pros and cons of larger and smaller sizes (performance boost vs. space preservation) but it seems the benefits of

• ### run local MySQL instance for fast mysql_real_escape_string callsJuly 15

Currently all MySQL data/API calls are handled by a remote DB cluster (i.e. network latency is a factor in total script execution time). To reduce execution time in this context, would it be sensible to run a local MySQL instance on each app server t