Investigating Blobstore and Repository Size and Space Usage

When working with Repositories and Blobstores, you may want to have some insight into how the storage space is being used.

This article provides you with a few different ways to find out where repository and blobstore space is being consumed.

Listing the Size of File-based Repositories and Blobstores

For file-based blobstores (not AWS S3), you can run a script that reports the disk space being used by each Blobstore and Repository.

The Groovy script that generates this report can be found here: nx-blob-repo-space-report.groovy

The script can be executed as a task in Nexus Repository Manager. In the Administration pane select System > Tasks. Create a new Execute Script task. Set the Language to groovy and the task frequency to Manual, then copy the Source from the link above and paste it into the provided text box.

When you execute the task, the output within nexus.log will look similar to the following (the directories scanned will differ):

*SYSTEM Script47 - Blob Storage scan STARTED.
*SYSTEM Script47 - Scanning /home/nexus/sonatype-work/nexus3/blobs/default
*SYSTEM Script47 - Scanning /opt/nexus/test2
*SYSTEM Script47 - Scanning /home/nexus/sonatype-work/nexus3/blobs/test1
*SYSTEM Script47 - Blob Storage scan ENDED. Report at /home/nexus/sonatype-work/nexus3/tmp/repoSizes-20181213-104154.json

You should be able to find the generated JSON report at the location provided in the log:

Report at /home/nexus/sonatype-work/nexus3/tmp/repoSizes-20181213-104154.json

(The actual location will vary according to your Nexus Repository Manager configuration.)
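If the task has been run several times, the tmp directory may hold more than one report. The sketch below picks the newest one by file name. It assumes the reports always follow the repoSizes-<date>-<time>.json naming seen in the log line above; latestReport is a hypothetical helper name, not part of the report script.

```groovy
// Returns the most recent repoSizes-<date>-<time>.json file in dir, or null if none exists.
// Assumption: the timestamp in the file name has the same shape as in the example log line.
File latestReport(File dir) {
    def reports = (dir.listFiles() ?: []).findAll { File f ->
        f.name ==~ /repoSizes-\d{8}-\d{6}\.json/
    }
    // The timestamp sorts lexicographically, so the greatest name is the newest report.
    reports.max { it.name }
}

// Example path taken from the log above; adjust it to your own installation.
def report = latestReport(new File('/home/nexus/sonatype-work/nexus3/tmp'))
println(report ? "Latest report: ${report}" : 'No report found')
```

Sorting by name rather than by modification time keeps the result stable even if the files are copied to another machine for analysis.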

Within the JSON report, there are details of each Blobstore and each Repository that uses the Blobstore. For example, the output below shows two Blobstores, each having a single Repository:

{
  "blobstore1": {
    "repositories": {
      "repositoryA": {
        "reclaimableBytes": 0,
        "totalBytes": 4173387
      }
    },
    "totalBlobStoreBytes": 4173387,
    "totalReclaimableBytes": 0,
    "totalRepoNameMissingCount": 0
  },
  "blobstore2": {
    "repositories": {
      "repositoryB": {
        "reclaimableBytes": 0,
        "totalBytes": 1397598
      }
    },
    "totalBlobStoreBytes": 1397598,
    "totalReclaimableBytes": 0,
    "totalRepoNameMissingCount": 0
  }
}

For each Repository, totalBytes indicates how much space is being used and reclaimableBytes indicates how much space may be reclaimed by running the Compact Blob Store maintenance task.

For each Blobstore, all of the Repository entries are aggregated. totalRepoNameMissingCount will display how many assets within the Blobstore are associated with a Repository that no longer exists.

The report will also include Repositories that are empty.
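On an instance with many repositories the report can be long, so it may help to flatten it into one row per Repository, largest first. The following is a minimal sketch that relies only on the field names shown in the sample report above; repositoriesBySize is a hypothetical helper name, and the sample text stands in for a real report file.

```groovy
import groovy.json.JsonSlurper

// Flattens the report into one row per repository, sorted by totalBytes, largest first.
List<Map> repositoriesBySize(String reportJson) {
    def report = new JsonSlurper().parseText(reportJson)
    def rows = []
    report.each { blobStoreName, blobStore ->
        blobStore.repositories.each { repoName, repo ->
            rows << [blobStore       : blobStoreName,
                     repository      : repoName,
                     totalBytes      : repo.totalBytes as long,
                     reclaimableBytes: repo.reclaimableBytes as long]
        }
    }
    rows.sort { a, b -> b.totalBytes <=> a.totalBytes }
}

// Sample input reduced from the report shown above. For a real report, read the
// file instead, for example: new File('<path from nexus.log>').text
String sample = '''
{
  "blobstore1": {
    "repositories": { "repositoryA": { "reclaimableBytes": 0, "totalBytes": 4173387 } },
    "totalBlobStoreBytes": 4173387, "totalReclaimableBytes": 0, "totalRepoNameMissingCount": 0
  },
  "blobstore2": {
    "repositories": { "repositoryB": { "reclaimableBytes": 0, "totalBytes": 1397598 } },
    "totalBlobStoreBytes": 1397598, "totalReclaimableBytes": 0, "totalRepoNameMissingCount": 0
  }
}
'''

repositoriesBySize(sample).each { row ->
    println "${row.blobStore}/${row.repository}: ${row.totalBytes} bytes (${row.reclaimableBytes} reclaimable)"
}
```

This runs with the standalone groovy command and does not need access to the Nexus Repository Manager instance, only to the report file.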

 

Finding the Largest Blobs Within a Blobstore

1. Using a Groovy script from the command line.

The Groovy script below finds the largest blobs inside a blobstore directory. Execute the script from the command line, making sure dir_name points to the correct path for your sonatype-work directory. The output lists the blobs larger than min_size (100000000 bytes, roughly 100 MB), sorted by size in descending order.

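// Minimum blob size, in bytes, to include in the output (roughly 100 MB)
long min_size = 100000000
// Path to your sonatype-work directory
String dir_name = '/opt/Nexus/sonatype-work/nexus3'

// If AntBuilder cannot be resolved on a newer Groovy version, try adding:
// import groovy.ant.AntBuilder
def ant = new AntBuilder()
def scanner = ant.fileScanner {
  fileset(dir: dir_name) {
    include(name: '**/blobs/**/*.properties')
    exclude(name: '**/metadata.properties')
    exclude(name: '**/*metrics.properties')
    // '**/tmp/**' excludes everything under a tmp directory; '**/tmp' alone would not
    exclude(name: '**/tmp/**')
  }
}
def results = [:].withDefault { 0 }
scanner.each { File file ->
  def properties = new Properties()
  file.withInputStream { is ->
    properties.load(is)
  }
  // Skip properties files that do not record a blob size
  String size_value = properties.getProperty('size')
  if (size_value == null) {
    return
  }
  long prop_size = size_value as long
  if (prop_size > min_size) {
    results.put(properties['@BlobStore.blob-name'], prop_size)
  }
}
// Largest blobs first
def sorted = results.sort { a, b -> b.value <=> a.value }

sorted.each { k, v -> println "${k}:${v}" }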
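The script above prints sizes as raw byte counts. If you would rather read them in larger units, a small helper along these lines could be applied to each value before printing. This is only a sketch: humanReadable is a hypothetical name, and it uses decimal (1000-based) units to match the way min_size treats 100000000 bytes as 100 MB.

```groovy
// Converts a byte count to a short string using decimal (1000-based) units.
String humanReadable(long bytes) {
    def units = ['B', 'KB', 'MB', 'GB', 'TB']
    double value = bytes
    int unit = 0
    // Step up one unit at a time until the value drops below 1000 or units run out.
    while (value >= 1000 && unit < units.size() - 1) {
        value /= 1000
        unit++
    }
    String.format(Locale.ROOT, '%.1f %s', value, units[unit])
}

// Sizes taken from the sample report earlier in this article
println humanReadable(4173387)
println humanReadable(1397598)
```

In the script above, the last line could then become sorted.each { k, v -> println "${k}:${humanReadable(v)}" }.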

2. Running a repository manager task.

The Groovy script below can be executed as a task in Nexus Repository Manager. In the Administration pane select System > Tasks. Create a new Execute Script task. Set the Language to groovy and the task frequency to Manual, and use the Source below. When you execute the task, the output logged to nexus.log lists, for each repository, the assets larger than min_size in descending size order.

import org.sonatype.nexus.repository.storage.StorageFacet
import org.sonatype.nexus.repository.Repository
import org.sonatype.nexus.repository.storage.Asset
import groovy.json.JsonOutput

long min_size = 100000000

repository.repositoryManager.browse().each { Repository repo ->
    StorageFacet storageFacet = repo.facet(StorageFacet)
    def tx = storageFacet.txSupplier().get()
    def results = [:].withDefault { 0 }
    try {
        tx.begin()

        // Collect every asset in this repository that is larger than min_size
        tx.browseAssets(tx.findBucket(repo)).each { Asset asset ->
            if (asset.size() > min_size) {
                results.put(asset.name(), asset.size())
            }
        }
    } finally {
        tx.close()
    }
    // Largest assets first; one JSON object is logged per repository
    def sorted = results.sort { a, b -> b.value <=> a.value }
    log.info(JsonOutput.prettyPrint(JsonOutput.toJson(sorted)))
}