ce_timing.html



1. Measure the digout command using logic analyser. Used t4_app to do some DIGIOUT command between
   a  parallel sequence output.

        
CE_PARAM_UPDWR_LO + CE_CMD_UPDWR
reps cd.n  time  each.
---------------------------------
0    12   9.08   
1    14   9.12  .020   
2    16   9.23  .037
3    18   9.31  .038
4    20   9.43  .043
5    22   9.47  .039 
10   32   9.87  .039
100 212  17.07  .039

CE_PARAM_UPDWR_HI + CE_PARAM_UPDWR_LO + CE_CMD_UPDWR
----------------------------------------------------
  0  12   9.07
  1  15   9.18  .036
  2  18   9.29  .036
  3  21   9.40  .036
  4  24   9.51  .036
  5  27   9.62  .036
 10  42  10.18. .037
100  312 20.07  .037 

Hmmm.... takes longer but, is doint HI+LO+UPDWR is quicker per operation?

How about just CE_CMD_UPDWR 
----------------------------------------------------
  0  12   9.04  
  5  17   9.32  .056
 10  22   9.58  .054
100 112  14.07  .050 
200 212  19.07  .050

So..
CE_CMD_UPDWR  100 took  5.03 : 0.050 usec each
LO+UPDWR      100 took  7.99 : 0.030 usec each
HI+O+UPDWR    100 took 11.00 : 0.030 usec each


2. Testing al_ck_ce_cmd_delay()


Using tick() to time delays:

	CMD_DELAY with  9999 ITER takes: 1000106 usec
	CMD_DELAY with 19999 ITER takes: 2000206 usec (what is wrong).

	t2 1000    =  0.001 s  - CMD_DELAY |      9              =   0.001001
	t2 5000    =  0.005 s  - CMD_DELAY |     49              =   0.005001
	t2 100000  =  0.1   s  - CMD_DELAY |    999              =   0.100011
	t2 500000  =  0.5   s  - CMD_DELAY |   4999              =   0.500051
	t2 1000000  = 1     s  - CMD_DELAY |   9999              =   1.000101
	t2 5000000  = 5     s  - CMD_DELAY |  19999,19999,9999   =   5.000501
	t2 10000000 = 10    s  - CMD_DELAY |  19999 x 4          =  10.001001

	CMD_DELAY time is about 1.0001 times the requested time.

Measurement with logic analyser: 250Mhz
usec    measured    actual
   0        9.08   
   1       10.09      1.01
  10       19.09     10.01
 100      109.10    100.02
1000    1.01ms      1000.92
3000    3.01ms      3000.92

Measurement with logic analyser: 25Mhz

  usec    measured    actual
  1000     1.01ms                    
  2000     2.01ms       2.00
  5000     5.01ms       5.00
 10000    10.01ms      10.00
100000   100.02ms     100.01      (Logic analyer seeing the 1.0001 delay)
200000   200.03ms     200.02      


3. Measuring CE_CMD_PARSEQ 


Used t3_app(): t3 48 1 x 

reps    ce.n     goLogic     us_per_CMD_PARSEQ
   0      12       9.05           .033
   1      20       9.32           .030
   5      52      10.27           .030
  10      92      11.47           .030
  11     100      11.71           .030
  12     108      11.95           .031
  20     172      13.86           .030
  50     412      21.07           .030

1 CE_CMD_PARSEQ takes 30 ns.  
8 CE_CMD_PARSEQ takes 8*30= 240 ns.

How about PARAM_VIDE_SEQ?
reps    ce.n     goLogic     us_per_CMD_PARSEQ
   0       4       9.07                    
  10     164      13.87           .030
  20     324      18.67           .030

So, CE_PARAM_VIDSEQ_PGx_x also take 30 ns.
    PG3/PG4 program is: 8*30 + 8*30 = 480 ns.


5. CE_CMD_SHUTTEROPEN/CE_CMD_SHUTTERCLOS timing


open/close:
reps  ce.n     usec      CMD
   0    12     9.07 
  10    32     9.67      .030
  50   112    12.07      .030
 100   212    15.07      .030
   
CE_CMD_SHUTTEROPEN takes 30ns.
CE_CMD_SHUTTERCLOS takes 30ns.


5. Fillin' the fifo 

   ce.n   usec.     usec/n
   20      3         .150
   28      4         .148
   52      7         .134
   92     12         .130
  172     22         .127
  412     54         .131

Take CPU about 130ns to copy a command into the FIFO.
So a DIGOUT at 260ns takes 50% CPU?


6. ADC notes 


Time for CD_PARAM_ADC

rep   ns  per_CE_CMD
 0    30  
 1    60  30 
 5   180  30
10   330  30

ce_put(ce, CE_PARAM_ADC| adc) takes 30 ns.



7. CATP transfers rates

Using t9_app() I created data in the controller, then used grasp_get_image() to transfer to sgir.
transfer rates:
 sg.x 2048 2048 2

	2048x2048x32bits: 16MB
	grasp_get_image() 16777216 bytes in 0.364 sec. 46.1 MB/s.  368.5 Mb/s
	grasp_get_image() 16777216 bytes in 0.362 sec. 46.3 MB/s.  370.6 Mb/s
	grasp_get_image() 16777216 bytes in 0.365 sec. 46.0 MB/s.  367.8 Mb/s
	grasp_get_image() 16777216 bytes in 0.362 sec. 46.3 MB/s.  370.7 Mb/s
	grasp_get_image() 16777216 bytes in 0.241 sec. 69.7 MB/s.  557.6 Mb/s
	grasp_get_image() 16777216 bytes in 0.241 sec. 69.6 MB/s.  556.8 Mb/s
	grasp_get_image() 16777216 bytes in 0.569 sec. 29.5 MB/s.  236.0 Mb/s

	2048x1024x32bits: 8MB
	grasp_get_image() 8388608 bytes in 0.118 sec. 70.9 MB/s.  567.1 Mb/s
	grasp_get_image() 8388608 bytes in 0.119 sec. 70.4 MB/s.  563.4 Mb/s
	grasp_get_image() 8388608 bytes in 0.122 sec. 68.9 MB/s.  551.1 Mb/s
	grasp_get_image() 8388608 bytes in 0.119 sec. 70.5 MB/s.  563.6 Mb/s

	1024x1024x32bits: 4MB
	grasp_get_image() 4194304 bytes in 0.060 sec. 70.2 MB/s.  561.9 Mb/s
	grasp_get_image() 4194304 bytes in 0.060 sec. 70.2 MB/s.  561.8 Mb/s
	grasp_get_image() 4194304 bytes in 0.060 sec. 70.2 MB/s.  561.9 Mb/s

	512x512x32bits:
	grasp_get_image() 1048576 bytes in 0.016 sec. 67.1 MB/s.  537.2 Mb/s
	grasp_get_image() 1048576 bytes in 0.016 sec. 67.2 MB/s.  537.9 Mb/s
	grasp_get_image() 1048576 bytes in 0.016 sec. 67.1 MB/s.  536.5 Mb/s

	Time to run t9_app in controller:
	sg.x 2048 2048 0 = 0.171 sec
	sg.x 2048 2048 1 = 1.184 sec   // does division
	sg.x 2048 2048 2 = 0.255 sec
	Note: about 0.010 sec in the above time is used for catp memory allocation.

	Hmm... 2Hz if we clock + sample + transfer?
			 maybe we should clock + add? Then transfer?

July 8, 2010: Re-do above tests with just timing 
the grasp_get_image() - 1st time I included calloc() in the time for data transfers.

	2048x2048x32bits: 16MB
	grasp_get_image() 16777216 bytes in 0.238 sec. 70.4 MB/s.  563.3 Mb/s
	grasp_get_image() 16777216 bytes in 0.238 sec. 70.5 MB/s.  563.9 Mb/s
	grasp_get_image() 16777216 bytes in 0.238 sec. 70.5 MB/s.  563.7 Mb/s
	grasp_get_image() 16777216 bytes in 0.238 sec. 70.5 MB/s.  564.3 Mb/s
	grasp_get_image() 16777216 bytes in 0.238 sec. 70.5 MB/s.  564.0 Mb/s

	2048x1024x32bits: 8MB
	grasp_get_image() 8388608 bytes in 0.120 sec. 70.2 MB/s.  561.3 Mb/s
	grasp_get_image() 8388608 bytes in 0.120 sec. 70.1 MB/s.  560.9 Mb/s
	grasp_get_image() 8388608 bytes in 0.120 sec. 70.1 MB/s.  561.1 Mb/s
	grasp_get_image() 8388608 bytes in 0.120 sec. 70.1 MB/s.  560.7 Mb/s

	512x512x32bits: 1MB.
	grasp_get_image() 1048576 bytes in 0.016 sec. 66.8 MB/s.  534.3 Mb/s
	grasp_get_image() 1048576 bytes in 0.016 sec. 66.8 MB/s.  534.2 Mb/s
	grasp_get_image() 1048576 bytes in 0.017 sec. 61.8 MB/s.  494.3 Mb/s
	grasp_get_image() 1048576 bytes in 0.016 sec. 66.6 MB/s.  532.8 Mb/s
	grasp_get_image() 1048576 bytes in 0.016 sec. 66.7 MB/s.  533.4 Mb/s

Aug 25, 2010: SG tells IRTF we must do streaming some benchmark while
calling sockserv_run in t9_app()

	512x512x16bits: 0.5MB
	grasp_get_image() 524288 bytes in 0.011 sec. 49.7 MB/s.  397.4 Mb/s

	2048x2048x16bits: 8MB
	grasp_get_image() 8388608 bytes in 0.159 sec. 52.7 MB/s.  421.9 Mb/s

	sg.x  512 512  1:
	  0.289 ms to allocate pool.
	  6.787 ms to fill data.
	sg.x 2048 2048 1:
	  4.448 ms to allocate pool.
	107.058 ms to fill data.

	Function to estimate catp pool seutp time based on number of pixel:
	   ns  = 1.0576883951822917 * n + 11733;
   for integer based math, use:
	   n = w*h;
	   usec = 12 + n/1000 + n/17000;

	transfer in general take a hit (70MB/s - > 50MB/s).

   Serial Fill & Transfer:
		sg.x 2048 2048 0 8 
		0. e_usec1=4453 e_sec2=105059 e_sec3=333000 
		----
		Rep 2/8. esec=0.243690 
		grasp_get_image() 8388608 bytes in 0.161 sec. 52.0 MB/s.  416.4 Mb/s
		send to dv () 0.096 sec.  13654 13654 13654.0 


2010-09-07 - Cross trigger measurement

Measured CE_CMD_NOTDONE1  command: 40 ns.
Measured CE_CMD_WAITDONE1 command: 40 ns.
Measured sync time of dev0&1 CE at 20 ns  (measure on logic analysis 250Mhz on FPG header).
 
Without XTrigger, 2 consecutive ce_go() are about ~380ns apart.


2010-09-08 - t9_app and catp

Or why tony thinks it's better to co-add in RAM...

t9_app is uboot to that generate data in catp pools. It takes about
109 msec to run: 4.5 for catp setup + 105 for generating data.

Ave time to catp transfer after generating data: 0.191 sec or 43.91MB/s, 351.35MB/s.

With t9_app & 333 ms loop (3Hz, 32% busy):       0.211 sec or 39.75MB/s, 318.05MB/s.
Able to do 1000 reps OK.

With t9_app & 250 ms loop (4Hz, 43% busy):       0.232 sec or 36.15MB/s, 289.26MB/s.
Able to do 1000 reps OK.

With t9_app & 225 ms loop (4.4Hz, 48% busy):     0.220 sec or 38.13MB/s, 305.04MB/s.
Able to do   91 reps before failing.

With t9_app & 200 ms loop (5Hz, 54% busy):       0.230 sec or 36.47MB/s, 291.77MB/s.
Able to do   29 reps before failing.

Reference:
	1009/sgx200
	1009/sgx225
	1009/sgx250
	1009/sgx333

------------------------------------------------------------------------